Reliability-oriented resource management for High-Performance Computing

Abstract

Reliability is an increasingly pressing issue for High-Performance Computing systems, as failures are a threat to large-scale applications, for which even a single run may incur significant energy and billing costs. Currently, application developers need to address reliability explicitly, by integrating application-specific checkpoint/restore mechanisms. However, the application alone cannot exploit system knowledge, which is not the case for system-wide resource management systems. In this paper, we propose a reliability-oriented policy that can significantly increase component reliability by combining the exploitation of checkpoint/restore mechanisms with proactive policies.

Similar resources

Contributions for Resource and Job Management in High Performance Computing

High Performance Computing is characterized by the latest technological evolutions in computing architectures and by the increasing needs of applications for computing power. A particular middleware, called the Resource and Job Management System (RJMS), is responsible for delivering computing power to applications. The RJMS plays an important role in HPC since it has a strategic place in the whole s...

Auction Oriented Approach for Resource Management in Grid Computing

Grid computing, emerging as a new paradigm for next-generation computing, enables the sharing, selection, and aggregation of geographically distributed heterogeneous resources for solving large-scale problems in science, engineering, and commerce. The resources in the Grid are heterogeneous and geographically distributed. The paper demonstrates the capability of economic-based systems for wide-a...

High-Performance Computing for Asset-Liability Management

Financial institutions require sophisticated tools for risk management. For company-wide risk management both sides of the balance sheet should be considered, resulting in an integrated asset liability management approach. Stochastic programming models suit these needs well and have already been applied in the field of asset liability management to improve financial operations and risk management. ...

The Mayo High Performance Teamwork Scale: reliability and validity for evaluating key crew resource management skills.

PURPOSE: To develop and evaluate a participant rating scale for assessing high performance teamwork skills in simulation medicine settings. METHODS: In all, 107 participants in crisis resource management (CRM) training in a multidisciplinary medical simulation center generated 273 ratings of key CRM skills after participating in two or three simulation exercises. These data were analyzed using ...

Journal

Journal title: Sustainable Computing: Informatics and Systems

Year: 2023

ISSN: 2210-5379, 2210-5387

DOI: https://doi.org/10.1016/j.suscom.2023.100873